ci: test gpu on self-hosted runners #108
I see that not only the job that I'm migrating to self-hosted runners, but also
Last month they were still passing on the default branch: https://github.com/filecoin-project/rust-filecoin-proofs-api/actions/runs/15213629917 @vmx Do you have an idea why that might be? cc @BigLep
Things should work, but the current master patches the rust-fil-proofs crates to point to the master branch (rust-filecoin-proofs-api/Cargo.toml, lines 32 to 36 in b06f9fb).
As there was a release of rust-fil-proofs, this repo should be updated to use the released version. So I suggest that the current maintainers update to those versions, and then we'll see if things still fail.
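As a minimal sketch of how one might confirm that a manifest no longer carries such overrides before cutting a release: the helper below reports whether a file contains any Cargo `[patch]` table. The function name `has_patch_section` is hypothetical, and it assumes the override in question is expressed as a `[patch...]` table, since the referenced Cargo.toml lines are not quoted in this thread.

```shell
#!/bin/sh
# Hypothetical helper (not part of this repository): exits 0 if the given
# manifest contains a [patch] or [patch.<source>] table header, 1 otherwise.
# Assumption: the master-branch override is written as a Cargo [patch] table.
has_patch_section() {
    manifest="$1"
    grep -Eq '^[[:space:]]*\[patch(\.|\])' "$manifest"
}
```

For example, `has_patch_section Cargo.toml && echo "patch overrides still present"` would flag a manifest that still points at a git branch through `[patch]`.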
I added a release commit here - c5246a9 - and the workflow now passes. Now, the question is: what release process should we follow? I see that the previous tags were created manually - https://github.com/filecoin-project/rust-filecoin-proofs-api/tags
Pull Request Overview
This PR updates the crate to v19.0.0, adds release notes, and configures the CI to run GPU tests on a self-hosted runner.
- Bump version and dependencies in Cargo.toml to 19.0.0
- Add a 19.0.0 release section and update link references in CHANGELOG.md
- Extend CI workflow to target a GPU-equipped self-hosted runner and install CUDA drivers
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| Cargo.toml | Bumped package and filecoin-proofs dependencies to 19.0.0 |
| CHANGELOG.md | Added 19.0.0 release notes and updated [Unreleased] and version links |
| .github/workflows/ci.yml | Expanded triggers, switched to a self-hosted GPU runner, and added CUDA driver setup |
Comments suppressed due to low confidence (3)
.github/workflows/ci.yml:52
- Verify that the runner labels exactly match your self-hosted runner configuration; a mismatch in the label '2xlarge+gpu' will prevent the job from ever being picked up.
runs-on: ['self-hosted', 'linux', 'x64', '2xlarge+gpu']
.github/workflows/ci.yml:61
- [nitpick] Consider using the official NVIDIA CUDA GitHub Action or baking the drivers into the AMI (per your TODO) to reduce setup time and complexity in each workflow run.
curl -L -o nvidia-driver-local-repo-ubuntu2404-570.148.08_1.0-1_amd64.deb https://us.download.nvidia.com/tesla/570.148.08/nvidia-driver-local-repo-ubuntu2404-570.148.08_1.0-1_amd64.deb
CHANGELOG.md:10
- [nitpick] It may help consumers if you add a brief migration note or highlight any breaking changes introduced by the bumped dependencies alongside the release header.
## [19.0.0] - 2025-07-07
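Regarding the driver download in .github/workflows/ci.yml: the version string appears three times in that `curl` command, so one possible way to keep the copies in sync is to derive the file name and URL from a single variable. This is only a sketch; the helper names `driver_deb_name` and `driver_deb_url` and the variable names are hypothetical, and the values are copied from the command quoted above.

```shell
#!/bin/sh
# Values copied from the curl command in the workflow; change them together.
DRIVER_VERSION="570.148.08"
DISTRO="ubuntu2404"

# Hypothetical helpers (not part of the workflow under review).
# Prints the local-repo package file name for the configured version.
driver_deb_name() {
    printf 'nvidia-driver-local-repo-%s-%s_1.0-1_amd64.deb\n' "$DISTRO" "$DRIVER_VERSION"
}

# Prints the download URL for that package.
driver_deb_url() {
    printf 'https://us.download.nvidia.com/tesla/%s/%s\n' "$DRIVER_VERSION" "$(driver_deb_name)"
}
```

The download step could then read `curl -L -o "$(driver_deb_name)" "$(driver_deb_url)"`, which expands to the same command as the one in the workflow.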
Thanks for the updates @galargh. @galargh: Would it maybe make sense to break this into two PRs (one for the release, and one for the CI adjustment)? @vmx: are there any steps we should follow for making releases (e.g., any
The release commit in here had me confused, but I think we're just trying to do two separate things at once? I don't think I mind as long as it's not squash merged, but it would have been clearer for reviewing if they were separate PRs. My only question is about the NVIDIA drivers: are they already installed on the standard GitHub machines but not on the current AMI that we have access to?
I would do the release separately. It's done similarly as for
Related to filecoin-project/rust-fil-proofs#1775
Similar to filecoin-project/rust-fil-proofs#1785
This PR enables the job that requires running on a machine with a GPU. It will run on a g6e.2xlarge runner.